Normalization of zero-inflated data: An empirical analysis of a new indicator family and its use with altmetrics data
Authors
Abstract
Recently, two new indicators (Equalized Mean-based Normalized Proportion Cited, EMNPC; Mean-based Normalized Proportion Cited, MNPC) were proposed which are intended for sparse scientometrics data. The indicators compare the proportion of mentioned papers (e.g. on Facebook) of a unit (e.g., a researcher or institution) with the proportion of mentioned papers in the corresponding fields and publication years (the expected values). In this study, we propose a third indicator (Mantel-Haenszel quotient, MHq) belonging to the same indicator family. The MHq is based on the MH analysis - an established method in statistics for the comparison of proportions. We test (using citations and assessments by peers, i.e. F1000Prime recommendations) if the three indicators can distinguish between different quality levels as defined on the basis of the assessments by peers. Thus, we test their convergent validity. We find that the indicator MHq is able to distinguish between the quality levels in most cases while MNPC and EMNPC are not. Since the MHq is shown in this study to be a valid indicator, we apply it to six types of zero-inflated altmetrics data and test whether different altmetrics sources are related to quality. The results for the various altmetrics demonstrate that the relationship between altmetrics (Wikipedia, Facebook, blogs, and news data) and assessments by peers is not as strong as the relationship between citations and assessments by peers. Actually, the relationship between citations and peer assessments is about two to three times stronger than the association between altmetrics and assessments by peers.
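As a rough illustration of the kind of pooling the MH analysis performs, the sketch below computes the standard Mantel-Haenszel common odds ratio over a set of 2×2 tables, one per field and publication-year combination. This is the textbook estimator; the abstract does not spell out the exact formula for the MHq, so the match is an assumption. The function name, the table layout, and the example counts are hypothetical.

```python
from typing import Iterable, Tuple

# One stratum (assumed: one field/publication-year combination) as a 2x2 table:
# (unit_mentioned, unit_not_mentioned, reference_mentioned, reference_not_mentioned)
Table = Tuple[int, int, int, int]


def mantel_haenszel_odds_ratio(tables: Iterable[Table]) -> float:
    """Textbook Mantel-Haenszel common odds ratio pooled over 2x2 tables.

    Hypothetical helper for illustration; not taken from the paper.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled denominator is zero; odds ratio undefined")
    return numerator / denominator


# Made-up counts for two strata, for illustration only.
strata = [(10, 90, 50, 950), (5, 45, 20, 380)]
print(mantel_haenszel_odds_ratio(strata))
```

A value above 1 would indicate that the unit's papers are mentioned more often than the reference sets across strata, and a value below 1 the opposite. Each stratum contributes in proportion to its size, which helps with sparse, zero-inflated data because no single small stratum has to yield a stable ratio on its own.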
Similar resources
Field- and time-normalization of zero-inflated data: An empirical analysis using citation and Twitter data
Thelwall (2017a, 2017b) proposed a new family of field- and time-normalized indicators, which is intended for sparse data. These indicators are based on units of analysis (e.g., institutions) rather than on the paper level. They compare the proportion of mentioned papers (e.g., on Twitter) of a unit with the proportion of mentioned papers in the corresponding fields and publication years (the exp...
Normalization of zero-inflated data: An empirical analysis of a new indicator family
Recently, two new indicators (Equalized Mean-based Normalized Proportion Cited, EMNPC; Mean-based Normalized Proportion Cited, MNPC) were proposed which are intended for sparse data. We propose a third indicator (Mantel-Haenszel quotient, MHq) belonging to the same indicator family. The MHq is based on the MH analysis – an established method for pooling the data from multiple 2×2 contingenc...
Zero-inflated negative binomial modeling, efficiency for analysis of length of maternity hospitalization
Background: Delivery is one of the most common causes of hospitalization worldwide, and modeling it can explain its distribution and the factors that increase or decrease it. The objective of the present study was to find a suitable model for maternal hospitalization time and to compare it with other models. Materials & Methods: The present study is an observational and cross-s...
A New Class of Zero-Inflated Logarithmic Series Distribution
Through this paper we suggest an alternative form of the modified zero-inflated logarithmic series distribution of Kumar and Riyaz (Statistica, 2013) and study some of its important aspects. The method of maximum likelihood is employed for estimating the parameters of the distribution and certain test procedures are considered for testing the significance of the additional parameter of the model. ...
Hurdle, Inflated Poisson and Inflated Negative Binomial Regression Models for Analysis of Count Data with Extra Zeros
In this paper, we propose Hurdle regression models for analysing count responses with extra zeros. Maximum likelihood estimation is used to estimate the model parameters. The application of the proposed model is presented on an insurance dataset. In this example, many of the claim counts are equal to zero, which clarifies the application of the model with a zero-inflat...
Journal title:
- CoRR
Volume abs/1712.02228 Issue
Pages -
Publication date 2017